ABSTRACT



[Abstract of the Disclosure] 

A method and apparatus for enhancing the performance of speech recognition by adaptively
changing a process of determining a final, recognized word depending on a user's selection
in a list of alternative words represented by a result of speech recognition. A speech
recognition method comprising: (a) recognizing speech uttered by a user and displaying a
list of alternative words to list a predetermined number of recognition results in a
predetermined order, (b) determining whether a user's selection from the list of
alternative words has been changed within a predetermined standby time, (c) if the user's
selection has not been changed within the predetermined standby time, determining an
alternative word from the list of the alternative words currently indicated by a cursor,
as a final, recognized word, and (d) if the user's selection has been changed within the
predetermined standby time, adjusting the standby time and returning to operation (b).

[Representative Drawing] 
FIG. 3 
SPECIFICATION



[Title of the Invention] 
Method and apparatus for speech recognition

[Brief Description of the Drawings] 

FIG. 1 is a block diagram of a speech recognition apparatus according to an
embodiment of the present invention;

FIG. 2 is a detailed block diagram of a post-processor of FIG. 1;

FIG. 3 is a flowchart for explaining a process of updating an erroneous word
pattern database (DB) by an erroneous word pattern manager of FIG. 2;

FIG. 4 is a table showing an example of the erroneous word pattern DB of FIG. 2;

FIG. 5 is a flowchart for explaining a process of changing the order of the
arrangement of alternative words by the erroneous word pattern manager of FIG. 2;

FIG. 6 is a flowchart for explaining a process of adjusting a standby time by a
dexterity manager of FIG. 2;

FIG. 7 is a flowchart for explaining a speech recognition method, according to an
embodiment of the present invention;

FIG. 8 is a flowchart for explaining a speech recognition method, according to
another embodiment of the present invention; and

FIG. 9 shows an example of a graphic user interface according to the present
invention.

[Detailed Description of the Invention]
[Object of the Invention] 

[Technical Field of the Invention and Related Art prior to the Invention] 

The present invention relates to speech recognition, and more particularly, to a 
method and apparatus for enhancing the success rate of speech recognition tasks by 
adaptively changing a process of determining a final, recognized word depending on a
user's selection in a list of alternative words represented by a result of speech 
recognition. 

Speech recognition refers to a technique by which a computer analyzes and
recognizes or understands human speech. Human speech sounds have specific
frequencies according to the shape of the mouth and the position of the tongue
during utterance. In speech recognition technology, human speech sounds are
therefore converted into electric signals, and frequency characteristics of the speech
sounds are extracted from the electric signals in order to recognize human utterances.
Such speech recognition technology is adopted in a wide variety of fields such as
telephone dialing, control of electronic toys, language learning, control of electric home
appliances, and so forth.

Despite the advancement of speech recognition technology, speech recognition
cannot yet be fully accomplished due to background noise or the like in an actual
speech recognition environment. Thus, errors frequently occur in speech recognition
tasks. In order to reduce the probability of such errors, methods are employed that
determine a final, recognized word depending on user confirmation or selection, either
by requesting the user to confirm recognition results of a speech recognizer or by
presenting the user with a list of alternative words derived from the recognition results
of the speech recognizer.

Conventional techniques associated with the above methods are disclosed in
U.S. Patent Nos. 4,866,778, 5,027,406, 5,884,258, 6,314,397, 6,347,296, and so on.
U.S. Patent No. 4,866,778 suggests a technique by which the most effectively
searched probable alternative word is displayed and, if the probable alternative word is
wrong, the next alternative word is displayed to find the correct recognition result.
According to this technique, a user must separately answer a series of YES/NO
questions presented by a speech recognition system and cannot predict which words
will appear in the next question, which makes the technique inefficient. U.S. Patent Nos.
5,027,406 and 5,884,258 present a technique by which alternative words derived from
speech recognition are arrayed and recognition results are determined depending on
the user's selections from the alternative words via a graphic user interface or voice.
According to this technique, since the user must perform additional manipulations to
select the correct alternative word in each case after he or she speaks, he or she
experiences inconvenience and tires of the iterative operations. U.S. Patent No.
6,314,397 shows a technique by which a user's utterances are converted into text based
on the best recognition results and corrected through a user review during which an
alternative word is selected from a list of alternative words derived from previously
considered recognition results. This technique suggests a smooth speech recognition
task. However, when the user uses a speech recognition system in real time, the user
must create a sentence while viewing recognition results. Accordingly, it is unreasonable
to input the erroneous recognition result as it is. U.S. Patent No. 6,347,296 discloses a
technique by which, during a series of speech recognition tasks, an indefinite recognition
result of a specific utterance is settled by automatically selecting an alternative word
from a list of alternative words with reference to a recognition result of a subsequent
utterance. According to this technique, successive errors may be caused by either an
indefinite recognition result of a subsequent utterance or an imperfect language model.

As described above, according to conventional speech recognition technology,
even when a correct recognition result of user speech is obtained, an additional task such
as user confirmation or selection must be performed at least once. In addition, when
the user confirmation is not performed, an unlimited amount of time is taken to
determine a final, recognized word.

[Technical Goal of the Invention] 

The present invention provides a speech recognition method of determining a
first alternative word as a final, recognized word after a predetermined standby time in a
case where a user does not select an alternative word from a list of alternative words
derived from speech recognition of the user's utterance, determining a selected
alternative word as a final, recognized word when the user selects the alternative word,
or determining an alternative word selected after an adjusted standby time as a final,
recognized word.

The present invention also provides an apparatus for performing the speech
recognition method.

According to an aspect of the present invention, there is provided a speech
recognition method comprising: (a) recognizing speech uttered by a user and displaying
a list of alternative words to list a predetermined number of recognition results in a
predetermined order, (b) determining whether a user's selection from the list of
alternative words has been changed within a predetermined standby time, and (c) if the
user's selection has not been changed within the predetermined standby time,
determining an alternative word from the list of the alternative words currently indicated
by a cursor, as a final, recognized word.

Preferably, the speech recognition method further comprises either (d) if the
user's selection has been changed within the predetermined standby time, adjusting the
standby time and returning to operation (b), or (d') if the user's selection has been
changed within the predetermined standby time, determining an alternative word
selected by the user as a final, recognized word.

According to another aspect of the present invention, there is provided a speech
recognition apparatus comprising: a speech input unit that inputs speech uttered by a
user; a speech recognizer that recognizes the speech input from the speech input unit
and creates a predetermined number of alternative words to be recognized in the order
of similarity; and a post-processor that displays a list of alternative words that arranges
the predetermined number of alternative words in a predetermined order and
determines an alternative word that a cursor currently indicates as a final, recognized
word if a user's selection from the list of alternative words has not been changed within
a predetermined standby time.

Preferably, the post-processor comprises: a window generator that generates a
window for a graphic user interface comprising the list of alternative words; a standby
time setter that sets a standby time from when the window is displayed to when the
alternative word on the list of alternative words currently indicated by the cursor is
determined as the final, recognized word; and a final, recognized word determiner that
determines a first alternative word from the list of alternative words that is currently
indicated by the cursor as a final, recognized word if the user's selection from the list of
alternative words has not been changed within the predetermined standby time, adjusts
the predetermined standby time if the user's selection from the list of alternative words
has been changed within the predetermined standby time, and determines an
alternative word on the list of alternative words selected by the user as a final,
recognized word if the user's selection has not been changed within the adjusted
standby time.

Preferably, the post-processor comprises: a window generator that generates a
window for a graphic user interface comprising a list of alternative words that arranges
the predetermined number of alternative words in a predetermined order; a standby
time setter that sets a standby time from when the window is displayed to when an
alternative word on the list of alternative words currently indicated by the cursor is
determined as a final, recognized word; and a final, recognized word determiner that
determines a first alternative word on the list of alternative words currently indicated by
the cursor as a final, recognized word if a user's selection from the list of alternative
words has not been changed within the standby time and determines an alternative
word on the list of alternative words selected by the user as a final, recognized word if
the user's selection from the list of alternative words has been changed.

[Structure of the Invention]

Hereinafter, embodiments of the present invention will be described in detail with
reference to the attached drawings.

FIG. 1 is a block diagram of a speech recognition apparatus according to an
embodiment of the present invention. Referring to FIG. 1, the speech recognition
apparatus includes a speech input unit 11, a speech recognizer 13, and a
post-processor 15.

The speech input unit 11 includes microphones and so forth, receives speech
from a user, removes a noise signal from the user speech, amplifies the user speech to
a predetermined level, and transmits the user speech to the speech recognizer 13.

The speech recognizer 13 detects a starting point and an ending point of the user
speech, samples speech feature data from sound sections except soundless sections
before and after the user speech, and vector-quantizes the speech feature data in
real time. Next, the speech recognizer 13 performs a Viterbi search to choose, from
words stored in a DB, the word acoustically closest to the user speech using the speech
feature data. To this end, Hidden Markov Models (HMMs) may be used. Feature
data of HMMs, which are built by training words, is compared with that of currently input
speech, so as to determine the most probable candidate word. The speech recognizer
13 completes the Viterbi search, determines a predetermined number, for example, three,
of the words acoustically closest to the currently input speech as recognition results in the
order of similarity, and transmits the recognition results to the post-processor 15.

The post-processor 15 receives the recognition results from the speech
recognizer 13, converts the recognition results into text signals, and creates a window
for a graphic user interface. Here, the window displays the text signals in the order of
similarity. An example of the window is shown in FIG. 9. As shown in FIG. 9, a
window 91 includes a message area 92 for displaying a message "A first alternative
word, herein, 'Tam Saek Gi', is being recognized", an area 93 for displaying a time belt,
and an area 94 for displaying a list of alternative words. The window 91 is displayed
on a screen until the time belt 93 corresponding to a predetermined standby time is over.
In addition, in a case where there is no additional user input with an alternative word
selection key or button within the standby time, the first alternative word is determined
as a final, recognized word. On the contrary, when there is an additional user input
with the alternative word selection key or button within the standby time, a final,
recognized word is determined through a process shown in FIG. 7 or 8, which will be
explained later.

FIG. 2 is a detailed block diagram of the post-processor 15 of FIG. 1. Referring
to FIG. 2, the post-processor 15 includes a standby time setter 21, a dexterity manager
22, a dexterity DB 23, a window generator 24, an erroneous word pattern manager 25,
an erroneous word pattern DB 26, and a final, recognized word determiner 27.

The standby time setter 21 sets a standby time from a point in time when the
window 91 for the graphic user interface is displayed to a point in time when an
alternative word currently indicated by a cursor is determined as a final, recognized
word. The standby time is represented by the time belt 93 in the window 91. The
standby time may be equally assigned to all of the alternative words on the list of
alternative words. The standby time may also be assigned differentially to each of the
alternative words from the most acoustically similar alternative word to the least
acoustically similar alternative word. The standby time may be equally assigned to all
users or may be assigned differentially to each user depending on the user's dexterity.
The standby time setter 21 provides the window generator 24 with the standby time and
the recognition results input from the speech recognizer 13.

The dexterity manager 22 adds a predetermined spare time to a selection time
determined based on information on a user's dexterity stored in the dexterity DB 23,
adjusts the standby time to the resulting sum, and provides the standby time setter 21
with the adjusted standby time. Here, the dexterity manager 22 adjusts the standby
time through a process shown in FIG. 6, which will be explained later. In addition, the
standby time may be equally assigned to all of the alternative words or may be assigned
differentially to each of the alternative words from the most acoustically similar
alternative word to the least acoustically similar alternative word.

The dexterity DB 23 stores different selection times determined based on the
user's dexterity. Here, 'dexterity' is a variable in inverse proportion to a selection time
required for determining a final, recognized word after the window for the graphic user
interface is displayed. In other words, the user's dexterity is derived from the average
of the selection times measured over a predetermined number of determinations of a
final, recognized word.

The window generator 24 generates the window 91 including the message area
92, the time belt 93, and the alternative word list 94 as shown in FIG. 9. The message
area 92 displays a current situation, the time belt 93 corresponds to the standby time
set by the standby time setter 21, and the alternative word list 94 lists the recognition
results, i.e., the alternative words, in the order of similarity. Here, the order of listing
the alternative words may be determined based on erroneous word patterns appearing
in previous speech recognition history as well as the similarity.

The erroneous word pattern manager 25 receives a recognized word determined
as the first alternative word by the speech recognizer 13 and the final, recognized word
provided by the final, recognized word determiner 27. If the erroneous word pattern
DB 26 stores the combination of the recognized words corresponding to the first
alternative word and the final, recognized word, the erroneous word pattern manager 25
adjusts the matching scores of the recognition results supplied from the speech
recognizer 13 and provides the adjusted results to the window generator 24.
The window generator 24 then changes the listing order of the alternative word list 94
based on the adjusted matching scores. For example, if "U hui jin" is determined as
the first alternative word and "U ri jib" is determined as the final, recognized word,
a predetermined weight is applied to "U ri jib". As a result, although the speech
recognizer 13 determines "U hui jin" as the first alternative word, the window generator
24 can place "U ri jib" in a higher position than "U hui jin".

When the first alternative word and the final, recognized word are different, the
erroneous word pattern DB 26 stores the first alternative word and the final, recognized
word as erroneous word patterns. As shown in FIG. 4, an erroneous word pattern
table includes a first alternative word 41 resulting from speech recognition, a final,
recognized word 42, first through nth user utterance features 43, an utterance propensity
44, and a number of times errors occur, i.e., a history n 45.

The final, recognized word determiner 27 determines the alternative word
currently indicated by the cursor as the final, recognized word depending on whether
the user makes an additional selection from the alternative word list 94 in the window 91
within the standby time represented by the time belt 93. In other words, when the user
does not additionally press the alternative word selection key or button within the
standby time after the window 91 is displayed, the final, recognized word determiner 27
determines the first alternative word currently indicated by the cursor as the final,
recognized word. When the user presses the alternative word selection key or button
within the standby time, the final, recognized word determiner 27 determines the final,
recognized word through the process of FIG. 7 or 8.

FIG. 3 is a flowchart for explaining a process of updating the erroneous word
pattern DB 26 by the erroneous word pattern manager 25 of FIG. 2. Referring to FIG.
3, in step 31, a determination is made as to whether the erroneous word pattern DB 26
stores a pair of the first alternative word and the final, recognized word provided by the
final, recognized word determiner 27. If in step 31, it is determined that the erroneous
word pattern DB 26 does not store the pair of the first alternative word and the final,
recognized word, the process ends.

If in step 31, it is determined that the erroneous word pattern DB 26 stores the
pair of the first alternative word and the final, recognized word, in step 32, a difference
value in utterance features is calculated. The difference value is a value obtained by
adding absolute values of differences between the first through nth utterance features 43
of the erroneous word patterns stored in the erroneous word pattern DB 26 and first
through nth utterance features of currently input speech.
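
As an illustrative, non-limiting sketch, the difference value of step 32 may be computed
as a sum of absolute differences over the utterance features. The function name and the
sample feature values below are assumptions introduced only for illustration and do not
appear in the drawings.

```python
def utterance_difference(stored_features, current_features):
    """Sum of absolute differences between stored and current utterance features."""
    if len(stored_features) != len(current_features):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(s - c) for s, c in zip(stored_features, current_features))


# Hypothetical feature values (for example, utterance speed, voice tone, noise level).
stored = [1.0, 0.5, 20.0]
current = [1.25, 0.5, 18.0]
difference = utterance_difference(stored, current)
print(difference)
```

The resulting value is then compared with the first threshold, as described for step 33.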

In step 33, the difference value is compared with a first threshold for update. If
in step 33, it is determined that the difference value is greater than or equal to the first
threshold, the process ends. If in step 33, the difference value is less than the first
threshold, i.e., if it is determined that the error occurred for the same reason as the
previous error recorded in the erroneous word pattern, such as a cold, a voice change in
the morning, background noise, or the like, in step 34, it is determined whether the first
alternative word is equal to the corresponding final, recognized word. The first
threshold may be set to an optimum value experimentally or through a simulation.

If it is determined in step 34 that the first alternative word is different from the
corresponding final, recognized word, in step 35, an average value of the first through
nth utterance features of currently input speech is calculated to update the utterance
propensity 44, and in step 36, the value of the history of the corresponding erroneous
word pattern is increased by 1 to update the history 45.

If it is determined in step 34 that the first alternative word is equal to the final,
recognized word, in step 37, it is determined whether the value of the history n is
greater than 0. If it is determined that the value of the history n is less than or equal
to 0, the process ends. If it is determined in step 37 that the value of the history n is
greater than 0, in step 38, the value of the history n is decreased by 1 to update the
history 45.
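
The update process of FIG. 3 may be sketched as follows, purely by way of example. The
class name, field names, and function signature are assumptions made for illustration.
How the stored pair is looked up in the erroneous word pattern DB 26 is left to the
caller, which passes either the matching pattern or None.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ErroneousWordPattern:
    first_alternative: str      # first alternative word 41
    final_word: str             # final, recognized word 42
    features: List[float]       # first through nth utterance features 43
    propensity: float = 0.0     # utterance propensity 44
    history: int = 0            # history n 45


def update_pattern(pattern: Optional[ErroneousWordPattern],
                   first_alternative: str,
                   final_word: str,
                   current_features: List[float],
                   first_threshold: float) -> bool:
    """Return True if the pattern was updated, False if the process simply ends."""
    if pattern is None:                                   # step 31: pair not stored
        return False
    difference = sum(abs(s - c)                           # step 32
                     for s, c in zip(pattern.features, current_features))
    if difference >= first_threshold:                     # step 33
        return False
    if first_alternative != final_word:                   # step 34
        pattern.propensity = sum(current_features) / len(current_features)  # step 35
        pattern.history += 1                              # step 36
        return True
    if pattern.history > 0:                               # step 37
        pattern.history -= 1                              # step 38
        return True
    return False


# Hypothetical stored pattern and feature values.
pattern = ErroneousWordPattern("U hui jin", "U ri jib", [1.0, 2.0], history=3)
update_pattern(pattern, "U hui jin", "U ri jib", [1.0, 2.5], first_threshold=1.0)
print(pattern.history, pattern.propensity)
```

In this sketch the history grows when the same misrecognition recurs under similar
utterance conditions and shrinks when the first alternative word turns out to be correct.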

FIG. 5 is a flowchart for explaining a process of changing an order of listing
alternative words via the erroneous word pattern manager 25 of FIG. 2.

Referring to FIG. 5, in step 51, a determination is made as to whether the
erroneous word pattern DB 26 stores a pair of a first alternative word and a second
alternative word as the final, recognized word or a pair of a first alternative word and a
third alternative word as the final, recognized word, with reference to the recognition
results and matching scores of Table 1 provided to the window generator 24 via the
speech recognizer 13. If in step 51, it is determined that the erroneous word pattern
DB 26 does not store the pair of the first alternative word and the second alternative
word or the pair of the first alternative word and the third alternative word, the process
ends. Here, Table 1 shows scores of first through third alternative words.

[Table 1]

Recognition Result    Matching Score
Hwang Gil Du          10
Hong Gi Su            9
Hong Gil Dong

If in step 51, it is determined that the erroneous word pattern DB 26 stores the
pair of the first alternative word and the second alternative word or the pair of the first
alternative word and the third alternative word, in step 52, a difference value in first
through nth utterance features is calculated. As described with reference to FIG. 3,
the difference value is a value obtained by adding absolute values of differences
between the first through nth utterance features stored in the erroneous word pattern
DB 26 and first through nth utterance features of currently input speech.

In step 53, the difference value is compared with a second threshold for changing
the order of listing the alternative words. If in step 53, it is determined that the
difference value in each pair is greater than or equal to the second threshold, i.e., an
error does not occur for the same reason as the previous error, the process ends. The
second threshold may be set to an optimum value experimentally or through a
simulation. If in step 53, it is determined that the difference value in each pair is less
than the second threshold, i.e., the error occurs for the same reason as the previous
error, in step 54, a matching score of a corresponding alternative word is adjusted. For
example, in a case where the erroneous word pattern DB 26 stores an erroneous word
pattern table as shown in FIG. 4 and the weight is set to 0.4, the recognition results and
the matching scores shown in Table 1 are changed into recognition results and scores
shown in Table 2. Here, a changed matching score "9.2" is obtained by adding a value
resulting from multiplication of the weight "0.4" by a history "3" to an original matching
score "8".

[Table 2]

Recognition Result    Matching Score
Hwang Gil Du          10
Hong Gi Su            9.2
Hong Gil Dong
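
By way of a hedged example, the score adjustment of step 54 and the subsequent
reordering may be sketched as below. The function name and the placeholder words
"word A" through "word C" are hypothetical; only the weight 0.4, the history 3, and the
original score 8 are taken from the description above.

```python
def adjust_and_reorder(results, histories, weight):
    """Add weight * history to each matching score, then sort by descending score.

    results:   list of (word, matching_score) in the recognizer's order
    histories: mapping from word to the history count of its erroneous word pattern
    """
    adjusted = [(word, score + weight * histories.get(word, 0))
                for word, score in results]
    # sorted() is stable, so words with equal scores keep the recognizer's order.
    return sorted(adjusted, key=lambda item: item[1], reverse=True)


# Placeholder candidates; "word C" is assumed to have a stored history of 3.
results = [("word A", 10.0), ("word B", 9.0), ("word C", 8.0)]
reordered = adjust_and_reorder(results, {"word C": 3}, weight=0.4)
for word, score in reordered:
    print(f"{word}: {score:.1f}")
```

Under these assumptions the candidate whose score rises from 8 to about 9.2 moves
ahead of the candidate scored 9, which is the reordering effect the erroneous word
pattern manager 25 is intended to produce.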

Meanwhile, the first through nth utterance features 43 used in the processes
shown in FIGS. 3 through 5 are information generated when the speech recognizer 13
analyzes speech. In other words, a portion of the information may be used for
determining speech recognition results, while the remaining portion is used only as
reference data. The information may also be measured using additional methods as
follows.

First, a time required for uttering a corresponding number of syllables is defined
as an utterance speed. Second, a voice tone is defined. When the voice tone is
excessively lower or higher than a microphone volume set in hardware, the voice tone
may be the cause of an error. For example, a low-pitched voice is hidden by noise and
a high-pitched voice is partially not received by the hardware. As a result, a voice signal
may be distorted. Third, a basic noise level, which is measured in a state when no
voice signal is input or in a space between syllables, is defined as a signal-to-noise ratio
(SNR). Finally, the change of voice is defined in a specific situation where a portion of
the voice varies due to a cold or a problem with the vocal cords that occurs in the mornings.
In addition, various other utterance features may be used.

FIG. 6 is a flowchart for explaining a process of adjusting the standby time via the 
dexterity manager 22 of FIG. 2. 

Referring to FIG. 6, in step 61, a difference value in a selection time is calculated
by subtracting a time required for determining a final, recognized word from the
selection time assigned for each user and stored in the dexterity DB 23.

In step 62, the difference value is compared with a third threshold for changing
the standby time. If in step 62, it is determined that the difference value is greater than
the third threshold, i.e., a given time is longer than a time for which the user can
determine a selection, in step 63, the selection time is modified. The third threshold
may be set to an optimum value experimentally or through a simulation. The modified
selection time is calculated by subtracting a value resulting from multiplication of the
difference value by the predetermined weight from the selection time stored in the
dexterity DB 23. For example, when the selection time stored in the dexterity DB 23 is
0.8 seconds, the difference value is 0.1 seconds, and the predetermined weight is 0.1,
the modified selection time is 0.79 seconds. The modified selection time is stored in
the dexterity DB 23 so as to update a selection time for the user.

If in step 62, it is determined that the difference value is less than or equal to the
third threshold, i.e., a final user's selection is determined by a timeout of the speech
recognition system after the selection time ends, in step 64, the difference value is
compared with a predetermined spare time. If in step 64, it is determined that the
difference value is greater than or equal to the spare time, the process ends.

If in step 64, it is determined that the difference value is less than the spare time,
in step 65, the selection time is modified. The modified selection time is calculated by
adding a predetermined extra time to the selection time stored in the dexterity DB 23.
For example, when the selection time stored in the dexterity DB 23 is 0.8 seconds and
the extra time is 0.02 seconds, the modified selection time is 0.82 seconds. The
modified selection time is stored in the dexterity DB 23 so as to update a selection time
for a user. The extra time is to prevent a potential error from occurring in a subsequent
speech recognition process, and herein, is set to 0.02 seconds.

In step 66, a standby time for the user is calculated by adding a predetermined
amount of extra time to the selection time modified in step 63 or 65, and the standby
time setter 21 is informed of the calculated standby time. Here, the extra time is to
prevent a user's selection from being determined regardless of the user's intention, and
herein, is set to 0.3 seconds.
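
The adjustment of FIG. 6 may be illustrated by the following non-limiting sketch. The
function and parameter names, and the values chosen for the third threshold and the
spare time, are assumptions for illustration; the weight 0.1 and the extra times 0.02 and
0.3 seconds are the example values given above.

```python
from typing import Optional


def modify_selection_time(stored: float, taken: float,
                          third_threshold: float, spare_time: float,
                          weight: float = 0.1,
                          extra_time: float = 0.02) -> Optional[float]:
    """Return the modified selection time, or None when the process ends unchanged."""
    difference = stored - taken                  # step 61
    if difference > third_threshold:             # step 62: user was faster than allotted
        return stored - weight * difference      # step 63
    if difference >= spare_time:                 # step 64: enough margin, no change
        return None
    return stored + extra_time                   # step 65


def standby_time(selection_time: float, margin: float = 0.3) -> float:
    """Step 66: standby time reported to the standby time setter 21."""
    return selection_time + margin


# Hypothetical threshold and spare time values.
new_time = modify_selection_time(0.8, 0.7, third_threshold=0.05, spare_time=0.03)
if new_time is not None:
    print(round(new_time, 2), round(standby_time(new_time), 2))
```

With a stored selection time of 0.8 seconds and a difference of about 0.1 seconds, the
sketch reproduces the modified selection time of about 0.79 seconds given in the example
of step 63.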

FIG. 7 is a flowchart for explaining a speech recognition method, according to an
embodiment of the present invention. The speech recognition method includes step 71
of displaying a list of alternative words, steps 72, 73, and 78 performed in a case of
no change in a user's selection, and steps 74, 75, 76, and 77 performed in a case
where a user's selection changes.

Referring to FIG. 7, in step 71, the window 91 including the alternative word list
94 listing the recognition results of the speech recognizer 13 is displayed. In the
present invention, the cursor is set to indicate the first alternative word on the alternative
word list 94 when the window 91 is displayed. In addition, the time belt 93 starts from
when the window 91 is displayed. In step 72, a determination is made as to whether
an initial standby time set by the standby time setter 21 has elapsed without additional
user input with the alternative word selection key or button.

If in step 72, it is determined that the initial standby time has elapsed, in step 73,
a first alternative word currently indicated by the cursor is determined as a final,
recognized word. In step 78, a function corresponding to the final, recognized word is
performed. If in step 72, it is determined that the initial standby time has not elapsed,
in step 74, a determination is made as to whether a user's selection has been changed
by the additional user input with the alternative word selection key or button.

If in step 74, it is determined that the user's selection has been changed, in step
75, the initial standby time is reset. Here, the adjusted standby time may be equal to
or different from the initial standby time according to an order of listing the alternative
words. For example, if the user's selection is changed into 'Tan Seong Ju Gi' shown in
FIG. 9, the message area 92 of the window 91 shows a message "The 'Tan Seong Ju Gi'
is being recognized" and concurrently, the time belt 93 starts according to the adjusted
standby time. If in step 74, it is determined that the user's selection has not been
changed, the process moves on to step 76.

In step 76, a determination is made as to whether the standby time adjusted in
step 75, or the initial standby time, has elapsed. If it is determined in step 76 that the
standby time has not elapsed, the process returns to step 74 to iteratively determine
whether the user's selection has been changed. If it is determined in step 76 that the
standby time has elapsed, then in step 77 the alternative word that the cursor currently
indicates as a result of the change in the user's selection is determined as the final,
recognized word. In step 78, a function corresponding to the final, recognized word is
performed.
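
The flow of steps 71 through 77 can be illustrated with a minimal sketch in Python. It is not the implementation of the present invention: the function name, the representation of user input as a recorded list of (timestamp, cursor index) pairs, and the per-position table of standby times are all assumptions made for illustration only.

```python
from typing import Sequence, Tuple


def select_with_adjusted_standby(
    alternatives: Sequence[str],
    cursor_moves: Sequence[Tuple[float, int]],
    standby_times: Sequence[float],
) -> Tuple[str, float]:
    """Simulate the FIG. 7 flow on a recorded sequence of cursor moves.

    alternatives  -- alternative word list, first alternative word first
    cursor_moves  -- (timestamp in seconds, new cursor index), sorted by time
    standby_times -- hypothetical standby time per list position; equal
                     values model an equally assigned standby time
    Returns the final, recognized word and the time at which it was fixed.
    """
    cursor = 0                            # step 71: cursor on the first word
    deadline = standby_times[cursor]      # initial standby time
    for timestamp, new_cursor in cursor_moves:
        if timestamp >= deadline:         # steps 72/76: standby time elapsed
            break
        if new_cursor != cursor:          # step 74: selection changed
            cursor = new_cursor
            # step 75: restart the wait with the (possibly different) standby time
            deadline = timestamp + standby_times[cursor]
    return alternatives[cursor], deadline  # steps 73/77


if __name__ == "__main__":
    words = ["first word", "second word", "third word"]  # placeholder words
    print(select_with_adjusted_standby(words, [(1.0, 1), (2.5, 2)], [2.0, 2.0, 2.0]))
```

In this sketch every change of selection restarts the wait, so the word under the cursor is finalized only after the user has stopped moving the cursor for one full standby time.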

FIG. 8 is a flowchart for explaining a speech recognition method according to
another embodiment of the present invention. The speech recognition method
includes step 81 of displaying a list of alternative words, steps 82, 83, and 86 performed
when there is no change in a user's selection, and steps 84, 85, and 86
performed when there is a change in a user's selection.

Referring to FIG. 8, in step 81, the window 91 including the alternative word list
94, which lists the recognition results of the speech recognizer 13, is displayed. The time
belt 93 starts when the window 91 is displayed. In step 82, a determination is
made as to whether an initial standby time set by the standby time setter 21 has
elapsed without additional user input via the alternative word selection key or button.

If it is determined in step 82 that the initial standby time has elapsed, then in step 83
the first alternative word, currently indicated by the cursor, is determined as the final,
recognized word. In step 86, a function corresponding to the final, recognized word is
performed. If it is determined in step 82 that the initial standby time has not elapsed,
then in step 84 a determination is made as to whether the user's selection has been
changed by additional user input via the alternative word selection key or button. If it is
determined in step 84 that the user's selection has been changed, then in step 85 the
alternative word that the cursor currently indicates as a result of the change in the user's
selection is determined as the final, recognized word. In step 86, a function
corresponding to the final, recognized word is performed. If it is determined in step 84
that the user's selection has not been changed, the process returns to step 82.
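
The FIG. 8 embodiment differs from FIG. 7 in that a change of selection finalizes the word at once instead of restarting the wait. The following Python sketch illustrates that difference under stated assumptions: the function name and the recorded list of (timestamp, cursor index) pairs standing in for key or button input are hypothetical.

```python
from typing import Sequence, Tuple


def select_immediately(
    alternatives: Sequence[str],
    cursor_moves: Sequence[Tuple[float, int]],
    standby_time: float,
) -> Tuple[str, float]:
    """Simulate the FIG. 8 flow on a recorded sequence of cursor moves.

    Returns the final, recognized word and the time at which it was fixed.
    """
    cursor = 0                                # step 81: cursor on the first word
    for timestamp, new_cursor in cursor_moves:
        if timestamp >= standby_time:         # step 82: standby time elapsed
            break
        if new_cursor != cursor:              # step 84: selection changed
            # step 85: the newly indicated word is final right away
            return alternatives[new_cursor], timestamp
    return alternatives[cursor], standby_time  # step 83: first word is final


if __name__ == "__main__":
    words = ["first word", "second word", "third word"]  # placeholder words
    print(select_immediately(words, [(0.5, 2)], 2.0))
```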

Table 3 below compares existing speech recognition methods with the
speech recognition method of the present invention with respect to the success rates of
speech recognition tasks and the number of times an additional task is performed in
various recognition environments.

[Table 3]

                        90% Recognition Environment              70% Recognition Environment
Suggestion Method of    Additional Task                          Additional Task
Alternative Word        0 Times   1 Time    2 Times   Total      0 Times   1 Time    2 Times   Total
----------------------------------------------------------------------------------------------------
Existing method 1       90%       0%        0%        90%        70%       0%        0%        70%
Existing method 2       0%        90%       0%        90%        0%        70%       0%        70%
Existing method 3       0%        99.9%     0%        99.9%      0%        97.3%     0%        97.3%
Present Invention       90%       9%        0.9%      99.9%      70%       21%       6.3%      97.3%



Alternative words are suggested in existing method 1. In existing method 2, a
user determines the best alternative word. In existing method 3, a user selects an
alternative word from a list of alternative words corresponding to recognition results.
The data shown in Table 3 was obtained on the assumptions that the 90% recognition
environment corresponds to noise in an office, that the 70% recognition environment
corresponds to noise in a car traveling on a highway, that the list of alternative words to
be recognized is infinite, and that the alternative words on the list are similar to one
another. According to Table 3, when the speech recognition method of the present
invention is adopted, the success rate of the speech recognition task is maximized as
the additional task is iteratively performed.
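
The figures in the "Present Invention" row of Table 3 are numerically consistent with a simple model in which each successive attempt succeeds independently with the recognition rate p of the environment, so that success after k additional tasks has probability p(1 - p)^k. The source does not state this model; it is an assumption used here only to check the arithmetic. A short Python sketch, with a hypothetical function name:

```python
from typing import List, Tuple


def success_by_additional_tasks(p: float, max_tasks: int = 2) -> Tuple[List[float], float]:
    """Return per-attempt success rates (in %) and their total (in %).

    Assumes (not stated in the source) that success after k additional
    tasks has probability p * (1 - p) ** k.
    """
    rates = [p * (1 - p) ** k for k in range(max_tasks + 1)]
    return [round(100 * r, 1) for r in rates], round(100 * sum(rates), 1)


if __name__ == "__main__":
    for p in (0.9, 0.7):
        print(p, success_by_additional_tasks(p))
```

Under this assumption, p = 0.9 gives 90%, 9%, and 0.9% for a total of 99.9%, and p = 0.7 gives 70%, 21%, and 6.3% for a total of 97.3%, matching the table.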
The present invention can be realized as computer-readable code on a
computer-readable recording medium. For example, a speech recognition method can
be accomplished as first and second programs recorded on a computer-readable
recording medium. The first program includes the tasks of: recognizing
speech uttered by a user and displaying a list of alternative words to list a
predetermined number of recognition results in a predetermined order. The second
program includes the tasks of: determining whether a user's selection from the
list of alternative words has been changed within a predetermined standby time; if the
user's selection has not been changed within the predetermined standby time,
determining an alternative word from the list of alternative words currently indicated
by a cursor as a final, recognized word; if the user's selection has been changed within
the predetermined standby time, adjusting the standby time; iteratively determining
whether the user's selection has been changed within the adjusted standby time; and if
the user's selection has not been changed within the adjusted standby time,
determining an alternative word selected by the user as a final, recognized word. Here,
the second program may be replaced with a program including the tasks of:
determining whether a user's selection from a list of alternative words has been
changed within a predetermined standby time; if the user's selection has not been
changed within the predetermined standby time, determining an alternative word from
the list of alternative words currently indicated by a cursor as a final, recognized word;
and if the user's selection has been changed within the predetermined standby time,
determining an alternative word selected by the user as a final, recognized word.

A computer-readable medium may be any kind of recording medium in which
computer-readable data is stored. Examples of such computer-readable media include
ROMs, RAMs, CD-ROMs, magnetic tapes, floppy disks, optical data storage devices,
and carrier waves (e.g., transmission via the Internet). Also, the
computer-readable code can be stored on computer-readable media distributed among
computers connected via a network.

Furthermore, functional programs, codes, and code segments for realizing the
present invention can be readily inferred by programmers skilled in the art.

Moreover, a speech recognition method and apparatus according to the present
invention can be applied to various platforms, including personal computers and personal
mobile communication devices such as portable phones and personal digital assistants
(PDAs). As a result, success rates of speech recognition tasks can be improved.




[Effect of the Invention] 

As described above, in a speech recognition method and apparatus according to
the present invention, the number of times a user performs an additional task and the
psychological pressure put on the user can be minimized, even in a poor speech
recognition environment, and the final success rate of speech recognition performed via a
voice command can be maximized. As a result, the efficiency of speech recognition can
be improved.

In addition, when a user's selection is not changed within a predetermined
standby time, a subsequent task can be automatically performed. Thus, the number of
times the user manipulates a button for speech recognition can be minimized. As a
result, since the user can easily perform speech recognition, user satisfaction with a
speech recognition system can be increased. Moreover, the standby time can be
adaptively adjusted to the user. Thus, the time required to perform speech recognition
tasks can be reduced.

While the present invention has been particularly shown and described with 
reference to exemplary embodiments thereof, it will be understood by those of ordinary 
skill in the art that various changes in form and details may be made therein without 
departing from the spirit and scope of the present invention as defined by the following 
claims. 
What is claimed is:

1. A speech recognition method comprising:

(a) recognizing speech uttered by a user and displaying a list of alternative words
to list a predetermined number of recognition results in a predetermined order;

(b) determining whether a user's selection from the list of alternative words has
been changed within a predetermined standby time; and

(c) if the user's selection has not been changed within the predetermined standby
time, determining an alternative word from the list of alternative words currently
indicated by a cursor as a final, recognized word.

2. The speech recognition method of claim 1, further comprising:

(d) if the user's selection has been changed within the predetermined standby
time, adjusting the standby time and returning to operation (b).

3. The speech recognition method of claim 1, further comprising:

(d) if the user's selection has been changed within the predetermined standby 
time, determining an alternative word selected by the user as a final, recognized word. 

4. The speech recognition method of any one of claims 1 through 3, wherein
operation (a) comprises adjusting the predetermined order of listing the alternative
words using a matching score of the speech uttered by the user and erroneous word
patterns appearing in a history of speech recognition.

5. The speech recognition method of claim 4, wherein operation (a)
comprises:

(a1) calculating a difference value in utterance features if a predetermined
erroneous word pattern database stores a first alternative word resulting from the
recognition of the speech;

(a2) comparing the difference value obtained in operation (a1) with a first
threshold;

(a3) determining whether the first alternative word is equal to the final,
recognized word if the difference value is smaller than the first threshold according to
the result of comparison in operation (a2);

(a4) calculating an average value of the utterance features of currently input
speech to update an utterance propensity and increasing a value of a history of a
corresponding erroneous word pattern by 1 to update the history, if the first alternative
word is different from the final, recognized word according to the result of determination
in operation (a3); and

(a5) decreasing the value of the history of the corresponding erroneous word
pattern by 1 to update the history, if the first alternative word is equal to the final,
recognized word according to the result of determination in operation (a3) and the value
of the history of the corresponding erroneous word pattern is greater than 1.

6. The speech recognition method of claim 4, wherein operation (a)
comprises:

(a1) calculating a difference value in utterance features if a predetermined
erroneous word pattern database stores at least one of a pair of first and second
alternative words and a pair of first and third alternative words derived from the
recognition of the speech;

(a2) comparing the difference value obtained in operation (a1) with a second
threshold; and

(a3) modifying a matching score of a corresponding alternative word if the
difference value is smaller than the second threshold according to the result of
comparison in operation (a2).

7. The speech recognition method of claim 6, wherein the modified matching
score in operation (a3) is calculated by adding a value resulting from multiplication of an
original matching score by a history value of the corresponding alternative word by a
predetermined weight.

8. The speech recognition method of any one of claims 1 through 3, further
comprising (d) adjusting the standby time according to user dexterity before operation
(b).

9. The speech recognition method of claim 8, wherein operation (d)
comprises:

(d1) calculating a difference value in a selection time by subtracting a time for
determining a final, recognized word from a selection time assigned for each user and
stored in a dexterity database;

(d2) comparing the difference value in the selection time obtained in operation
(d1) with a third threshold;

(d3) modifying the selection time if the difference value in the selection time is
greater than the third threshold, according to the result of comparison in operation (d2);

(d4) comparing the difference value in the selection time with a predetermined
spare time if the difference value in the selection time is less than or equal to the third
threshold, according to the result of comparison in operation (d2);

(d5) modifying the selection time if the difference value in the selection time is
less than the spare time according to the result of comparison in operation (d4); and

(d6) calculating a standby time of a corresponding user by adding a
predetermined extra time to the modified selection time in operation (d3) or (d5).

10. The speech recognition method of claim 9, wherein in operation (d3), the 
selection time is modified by subtracting a value resulting from multiplication of the 
difference value in the selection time by a predetermined weight from the selection time 
stored in the dexterity database. 

11. The speech recognition method of claim 9, wherein in operation (d5), the
selection time is modified by adding a predetermined extra time to the selection time
stored in the dexterity database.

12. The speech recognition method of any one of claims 1 through 3, wherein
the standby time is equally assigned to all of the alternative words on the list of
alternative words.

13. The speech recognition method of any one of claims 1 through 3, wherein
the standby time is assigned differentially to each of the alternative words on the list of
alternative words according to the predetermined order of listing the alternative words.

14. A computer-readable recording medium comprising: 



a first program that recognizes speech uttered by a user and displays a list of 
alternative words to list a predetermined number of alternative words derived from the 
recognition of the speech in a predetermined order; and 

a second program that determines whether a user's selection from the list of
alternative words has been changed within a predetermined standby time and
determines an alternative word on the list of alternative words that a cursor currently
indicates, as a final, recognized word, if the user's selection has not been changed.

15. The computer-readable recording medium of claim 14, wherein the 
second program further comprises: 

adjusting the standby time if the user's selection has been changed within the 
predetermined standby time; 

determining whether the user's selection has been changed within the adjusted 
standby time; and 

determining an alternative word on the list of alternative words selected by the 
user as a final, recognized word if it is determined that the user's selection has not been 
changed within the adjusted standby time. 

16. The computer-readable recording medium of claim 14, wherein the 
second program further comprises: 

determining an alternative word on the list of alternative words selected by the 
user as a final, recognized word if it is determined that the user's selection has been 
changed within the predetermined standby time. 

17. A speech recognition apparatus comprising: 

a speech input unit that inputs speech uttered by a user; 

a speech recognizer that recognizes the speech input from the speech input unit
using a predetermined speech recognition algorithm and creates a predetermined
number of alternative words to be recognized in the order of similarity; and

a post-processor that displays a list of alternative words that arranges the 
predetermined number of alternative words in a predetermined order and determines an 
alternative word that a cursor currently indicates as a final, recognized word if a user's 
selection from the list of alternative words has not been changed within a predetermined 
standby time. 
18. The speech recognition apparatus of claim 17, wherein the post-processor
comprises:

a window generator that generates a window for a graphic user interface
comprising the list of alternative words;

a standby time setter that sets a standby time from when the window is displayed
to when the alternative word on the list of alternative words currently indicated by the
cursor is determined as the final, recognized word; and

a final, recognized word determiner that determines a first alternative word from
the list of alternative words that is currently indicated by the cursor as a final, recognized
word if the user's selection from the list of alternative words has not been changed
within the predetermined standby time, adjusts the predetermined standby time if the
user's selection from the list of alternative words has been changed within the
predetermined standby time, and determines an alternative word on the list of
alternative words selected by the user as a final, recognized word if the user's selection
has not been changed within the adjusted standby time.

19. The speech recognition apparatus of claim 17, wherein the post-processor
comprises:

a window generator that generates a window for a graphic user interface
comprising a list of alternative words that arranges the predetermined number of
alternative words in a predetermined order;

a standby time setter that sets a standby time from when the window is displayed
to when an alternative word on the list of alternative words currently indicated by the
cursor is determined as a final, recognized word; and

a final, recognized word determiner that determines a first alternative word on the
list of alternative words currently indicated by the cursor as a final, recognized word if a
user's selection from the list of alternative words has not been changed within the
standby time and determines an alternative word on the list of alternative words
selected by the user as a final, recognized word if the user's selection from the list of
alternative words has been changed.

20. The speech recognition apparatus of claim 18 or 19, wherein the
post-processor further comprises:

an erroneous word pattern database that stores, as data, a recognized word
determined as a first alternative word by the speech recognizer, a final, recognized word
provided by the final, recognized word determiner, at least one user utterance feature,
an utterance propensity, and a history; and

an erroneous word pattern manager that receives recognition results and scores
from the speech recognizer, adjusts a score of a recognized word corresponding to an
erroneous word pattern, and informs the window generator of the adjusted score so as
to change the order of listing the alternative words.

21. The speech recognition apparatus of claim 18 or 19, wherein the
post-processor further comprises:

a dexterity database that stores, as data, a selection time based on the user's
dexterity; and

a dexterity manager that adjusts the standby time to a value obtained by adding
a predetermined spare time to the selection time stored in the dexterity database and
informs the standby time setter of the adjusted standby time.

22. The speech recognition apparatus of claim 20, wherein the post-processor
further comprises:

a dexterity database that stores, as data, a selection time based on the user's
dexterity; and

a dexterity manager that adjusts the standby time to a value obtained by adding
a predetermined spare time to the selection time stored in the dexterity database and
informs the standby time setter of the adjusted standby time.

23. The speech recognition apparatus of claim 18 or 19, wherein the standby
time is determined depending on the dexterity of the user.

24. The speech recognition apparatus of claim 18, wherein the adjusted 
standby time is equally assigned to all of the alternative words on the list of alternative 
words. 

25. The speech recognition apparatus of claim 18, wherein the adjusted
standby time is assigned differentially to each of the alternative words on the list of
alternative words according to an order of listing the alternative words on the list of
alternative words.